******************************************************
*               LAND FOR PRODUCTION                  *
* EVIDENCE FOR THE CANADIAN MANUFACTURING INDUSTRY   *
* ------------------------------------------------   *
*                  MASTER DO-FILE                    *
*                      ------                        *
* This do-file summarizes the procedure used to      *
* build our land dataset as well as the analyses     *
* performed in Stata. It also recalls the procedures *
* performed in other software: C++, ArcGIS, QGIS     *
******************************************************

global workpath "/Users/kristianbehrens/Desktop/tokyo_todo/crsh_land_use/replication_archive"
global datapath "$workpath/data_raw"
global temppath "$workpath/temp"
global outpath "$workpath/data_processed"
global resultpath "$workpath/results"

* The following datapath contains the private Scott's All business directories data
* See the readme file for information on how to access that data
global privatedata "$datapath/private"
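
// Optional sketch (not part of the original archive): make sure the output
// folders exist before the log is opened below. It assumes the folder layout
// defined by the globals above. Any error from mkdir (for example, because the
// folder already exists) is silently ignored by "capture".
foreach g in temppath outpath resultpath {
	capture mkdir "${`g'}"
}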

qui cap log using "$resultpath/_MASTER_log.smcl", replace
set more off

// The first two parts (part 1 and part 2 below) prepare the database used for the
// regression analysis.
// The third part (part 3 below) constructs all the tables for the paper

// ******************************************************************************
// NOTE: Unfortunately, we cannot provide the regression database, as it contains
// commercial data, access to which must be negotiated with Scott's Business Directories
// (https://www.scottsdirectories.com/). To those who can show us that
// they have access via a subscription to the whole Scott's Business Directories,
// we will be happy to provide the regression database and the auxiliary files
// required to build it.
// NOTE: Scott's maintains only the most recent database; older 'vintages' can no
// longer be downloaded. We have the old vintages on our machines but cannot
// share them. However, as noted above, we can provide the data to subscribers
// of the Scott's Business Directories.
// ******************************************************************************


// Note: If you are interested in constructing the database, uncomment the
// code between BEGIN COMMENTED OUT/END COMMENTED OUT.
// If not, you can just run all the regressions below using the
// database_for_reg.dta file located in /data_processed.
// To construct the database, you need access to the private data. Please see
// the readme file for further information. Without this access, you cannot
// construct the database.


// BEGIN COMMENTED OUT
/*
// -------------------------------------------------------------------------
// PART 1 : PROCESSING THE SCOTTS DATASET (see do-file "scotts_global.do")
// -------------------------------------------------------------------------

// Much of the work here is GIS processing and geocoding.
// The following files are used as inputs
// 1) /data_raw/private/address_geo20012019.dta: the raw geocoding output from three
//    different procedures (DMTI address files, GoogleAPI with address but without firm name, 
//    and GoogleAPI with firm name)
// 2) /data_raw/private/address_da.dta: contains the geocoded information for the plants
//    including the dissemination area (DA) they are associated with

// Below are the different steps. Numbered steps (1, 2, ...) have corresponding dofiles. Lettered steps
// (A, B, ...) have no corresponding dofiles and correspond to (mostly manual) GIS work



// Step 1 : Pooling all the scotts datasets from the year 2001 to 2019 and creating the file of unique addresses
		// Input : 9 annual scotts datasets 2001; 2003; 2005; 2007; 2009; 2011; 2013; 2017; 2019
		// Method/Tools  : Appending files using Stata
		// Output: (1) pooled dataset with scotts from 2001-2019 :"scotts_global.dta" 

do "$workpath/dofiles/step1_scotts_global.do"


// Step 2 : Processing the unique addresses with their geocoded coordinates
		// Method/Tools  : 1st option of geocoding using ArcGIS; 2nd and 3rd options of geocoding using the Google API;
		// merging files using Stata
		// Input: File containing the results from the three geocoding options : "address_geo20012019.dta"
		// Output: (1) Final geocoded file with the best of the 3 geocoding options : address_geo2001_2017.dta

do "$workpath/dofiles/step2_geocode_address.do"		


// Step A : Collecting other plant location characteristics
		// Method/Tools : use ArcGIS to perform a spatial join of unique addresses and dissemination area polygons, then use Stata to convert the resulting file
		// Output: Geocoded unique addresses with other location characteristics (da_density, da_pop, economic region) : "data_raw/private/address_da.dta"
		// Auxiliary file: "dofiles/aux/stepA_location_char.do"
		// Creates the address_da.dta file, which is in the private data directory

// -------------------------------------------------------------------------
// PART 2 : PROCESSING THE POLYGON DATASETS AND PERFORMING THE SPATIAL JOIN
// -------------------------------------------------------------------------
	
// Step B : Harmonizing the various polygon datasets
		// Input : dozens of polygon datasets of buildings, parcels, lidar
		// Method/Tools : ArcGIS/QGIS to create new variables and harmonize measures and projection systems
		// Output: polygon datasets with 3 key variables : idsup; area_c; neighbor
		// GIS WORK, no dofiles in this step

// Step C : Preparing the polygons
		// Auxiliary file: "dofiles/aux/stepC_shape2dta.do"
		// Output:  Three files, scotts_b.dta (building), scotts_p.dta (parcels), and scotts_h.dta (height)
		// GIS WORK, no dofiles in this step

// Step 3 : Spatial join
		// Input : polygon datasets; unique geocoded addresses (scotts_address.dta)
		// Method/Tools : QGIS to perform 3 options of spatial join to associate polygons with unique addresses; Stata to append/merge files and choose the best association option
		// Output : scotts plants with the best associated surface (scotts_processed.dta)
		
do "$workpath/dofiles/step3_spatial_join.do"

				
// Step D : Creating the land variable and quality measures (convert building-parcel spatial join into dta)
		// Input : scotts plants with the best associated surface (scotts_processed.dta); check.dta; scotts_XY75501.dta; floorheight0.dta; bonp.dta, ponb.dta; zoning (scotts_lur.dta)
		// Method/Tools : ArcGIS for the spatial join of buildings and parcels; C++ and ArcGIS to identify centers; Stata to compute the land measures and quality variables
		// Output : Final dataset (scotts_final.dta)
		//do "/dofiles/aux/stepD_floor2area.do"  
		// Creates the buildings_on_parcel.dta and parcels_on_building.dta files 

		
// Step 4: Clustering
		// Input : DA level population data (in /clustering/raw)
		// Method/Tools : Uses Stata to prepare samples. Then requires running C++ code to create the clusters,
		// and uses QGIS/Cartographica to prepare the polygons that are 'centers' and generate their centroids
		// Output : CMA centers for the analysis in 

do "$workpath/dofiles/step4_cluster.do"  	// prepare files to be used in C++ to identify town centers

		
// Step 5: Final assembly of all the moving pieces into the unique database that we will
// use for all the regression analyses below

		
do "$workpath/dofiles/step5_scotts_final.do"
*/
// END COMMENTED OUT
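
// Optional sketch (not part of the original archive): warn early if the
// regression database is missing. The file name and location follow the note
// above (database_for_reg.dta in /data_processed); that the table do-files
// below read exactly this file is an assumption, so this only prints a warning
// and does not stop the run.
capture confirm file "$outpath/database_for_reg.dta"
if _rc {
	display as error "Warning: database_for_reg.dta not found in $outpath (see the readme file)."
}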



// -------------------------------------------------------------------------
// PART 3 : Run all the regressions and produce all the graphs in the paper
// -------------------------------------------------------------------------



// ***************************************************************************************
// Table 1: Determinants of industrial density (establishment employment over parcel size)
// ***************************************************************************************

do "$workpath/dofiles/table1.do"

// ********************************************************************************************
// Table 2: Determinants of parcel coverage (establishment building footprint over parcel size)
// ********************************************************************************************

do "$workpath/dofiles/table2.do"

// ************************************************************************************
// Table 3: Robustness checks for sample selection, data quality, and CMA fixed effects
// ************************************************************************************

do "$workpath/dofiles/table3.do"

// ******************************************************************
// Table 4: Robustness checks using assessment roll data for Montréal
// ******************************************************************

do "$workpath/dofiles/table4.do"

// ****************************************************************************************************
// Table 5: Crowding with Montréal assessment roll data and our Canada-wide sample of ‘lowrise sectors’
// ****************************************************************************************************

do "$workpath/dofiles/table5.do"

// **************************************************
// Table 6: Decomposing the density distance gradient
// **************************************************

do "$workpath/dofiles/table6.do"

// *******************************************************************************
// Footnote 16: Location stickiness (controlling for growth of establishment size)
// *******************************************************************************

do "$workpath/dofiles/footnote16.do"

// *****************************************
// Table B4: Sample selection on observables
// *****************************************

do "$workpath/dofiles/tableB4.do"


// ***************************************************************
// Table C1: Employment over parcel size by NAICS 3-digit industry
// ***************************************************************

do "$workpath/dofiles/tableC1.do"

// ***********************************************************************
// Table C2: Building footprint over parcel size by NAICS 3-digit industry
// ***********************************************************************

do "$workpath/dofiles/tableC2.do"
 
// *******************************************************************************************
// Figure C1: Industrial density (establishment employment over parcel size) and establishment
// *******************************************************************************************

do "$workpath/dofiles/figureC1.do"
 
// ********************************************************************************************************
// Figure C2: Employment over parcel size and building footprint over parcel size by NAICS 3-digit industry
// ********************************************************************************************************

do "$workpath/dofiles/figureC2.do"
 
// ********************************************************************************************************
// Figure C3: Employment over parcel size and building footprint over parcel size by NAICS 3-digit industry
// ********************************************************************************************************

do "$workpath/dofiles/figureC3.do"


// ***********************************************************************
// Tables B1-B3: Additional tabulations
// Note: These dofiles tabulate information that can be retrieved from the
//       log file; since this is very easy, we don't write output files
// ***********************************************************************

do "$workpath/dofiles/tableB1.do"
do "$workpath/dofiles/tableB2.do"
do "$workpath/dofiles/tableB3.do"


qui cap log close
		
